Boosting localized binary features for speech recognition
نویسندگان
چکیده
In a recent work, the framework of Boosted Binary Features (BBF) was proposed for ASR. In this framework, a small set of localized binary-valued features are selected using the Discrete Adaboost algorithm. These features are then integrated into a standard HMM-based system using either single layer perceptrons (SLP) or multilayer perceptrons (MLP). The features were found to perform significantly better (when coupled with SLP) and equally well (when coupled with MLP) compared to MFCC features on the TIMIT phoneme recognition task. The current work presents an overview of the idea and extends it in two directions: 1) fusion of BBF with MFCC and an analysis of their complementarity, 2) scalability of the proposed features from phoneme recognition to the continuous speech recognition task and reusability on unseen data.
منابع مشابه
Boosting Localized Features for Speaker and Speech Recognition
In this thesis, we propose a novel approach for speaker and speech recognition involving localized, binary, data-driven features. The proposed approach is largely inspired by similar localized approaches in the computer vision domain. The success of these existing approaches coupled with their proven advantages of robustness and computational efficiency motivated us to apply these ideas to the ...
متن کاملClassification of emotional speech using spectral pattern features
Speech Emotion Recognition (SER) is a new and challenging research area with a wide range of applications in man-machine interactions. The aim of a SER system is to recognize human emotion by analyzing the acoustics of speech sound. In this study, we propose Spectral Pattern features (SPs) and Harmonic Energy features (HEs) for emotion recognition. These features extracted from the spectrogram ...
متن کاملAn Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition
Convolutional Neural Networks (CNNs) have been shown their performance in speech recognition systems for extracting features, and also acoustic modeling. In addition, CNNs have been used for robust speech recognition and competitive results have been reported. Convolutive Bottleneck Network (CBN) is a kind of CNNs which has a bottleneck layer among its fully connected layers. The bottleneck fea...
متن کاملImplementing Gender-Dependent Vowel-Level Analysis for Boosting Speech-Based Depression Recognition
Whilst studies on emotion recognition show that genderdependent analysis can improve emotion classification performance, the potential differences in the manifestation of depression between male and female speech have yet to be fully explored. This paper presents a qualitative analysis of phonetically aligned acoustic features to highlight differences in the manifestation of depression. Gender-...
متن کاملBoosting Local Spectro-Temporal Features for Speech Analysis
We introduce the problem of phone classification in the context of speech recognition, and explore several sets of local spectro-temporal features that can be used for phone classification. In particular, we present some preliminary results for phone classification using two sets of features that are commonly used for object detection: Haar features and SVMclassified Histograms of Gradients (HoG).
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2012